
    A General Framework for Learning Mean-Field Games

    This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population. It first establishes the existence of a unique Nash equilibrium for this GMFG, and demonstrates that naively combining reinforcement learning with the fixed-point approach from classical MFGs yields unstable algorithms. It then proposes value-based and policy-based reinforcement learning algorithms (GMF-V and GMF-P, respectively) with smoothed policies, and analyzes their convergence properties and computational complexities. Experiments on an equilibrium product pricing problem demonstrate that GMF-V-Q and GMF-P-TRPO, two specific instantiations of GMF-V and GMF-P with Q-learning and TRPO, respectively, are both efficient and robust in the GMFG setting. Moreover, their convergence speed, accuracy, and stability are superior to existing multi-agent reinforcement learning algorithms in the N-player setting. Comment: 43 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1901.0958
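    To make the smoothed fixed-point iteration concrete, the following is a minimal sketch of a GMF-V-style loop on a toy tabular problem: Q-learning against a frozen mean field, a softmax-smoothed policy, then a forward update of the population distribution. The crowd-aversion environment, reward shape, and all hyperparameters are illustrative assumptions, not the paper's pricing benchmark.

    ```python
    import numpy as np

    # Minimal sketch of a GMF-V-style fixed-point loop with a smoothed policy.
    # The toy crowd-aversion environment and hyperparameters are assumptions.
    S, A = 5, 3                                   # states, actions
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a] = distribution over next states
    gamma, tau = 0.9, 0.5                         # discount, softmax temperature

    def reward(s, a, L):
        # Crowd-averse reward: occupying a crowded state is penalized.
        return 1.0 - L[s] + 0.1 * a / (A - 1)

    def q_learning(L, steps=2000, alpha=0.1, eps=0.1):
        # Stage 1: learn Q for the single-agent MDP induced by a frozen mean field L.
        Q = np.zeros((S, A))
        s = int(rng.integers(S))
        for _ in range(steps):
            a = int(rng.integers(A)) if rng.random() < eps else int(Q[s].argmax())
            s2 = int(rng.choice(S, p=P[s, a]))
            Q[s, a] += alpha * (reward(s, a, L) + gamma * Q[s2].max() - Q[s, a])
            s = s2
        return Q

    def smoothed_policy(Q):
        # Softmax smoothing: replaces the naive argmax best response that
        # makes the plain fixed-point iteration unstable.
        e = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
        return e / e.sum(axis=1, keepdims=True)

    def mean_field_update(pi, L, steps=20):
        # Stage 2: propagate the population distribution forward under pi.
        for _ in range(steps):
            L = np.einsum("s,sa,sat->t", L, pi, P)
        return L

    L = np.full(S, 1.0 / S)                       # initial mean field
    for _ in range(50):                           # alternate stages toward a fixed point
        pi = smoothed_policy(q_learning(L))
        L = mean_field_update(pi, L)
    ```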

    Sample Efficient Reinforcement Learning with REINFORCE

    Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundations of their global convergence theory. However, prior works have required either exact gradients or state-action visitation-measure-based mini-batch stochastic gradients with a diverging batch size, which limits their applicability in practical scenarios. In this paper, we consider classical policy gradient methods that compute an approximate gradient from a single trajectory or a fixed-size mini-batch of trajectories under soft-max parametrization and log-barrier regularization, along with the widely used REINFORCE gradient estimation procedure. By controlling the number of "bad" episodes and resorting to the classical doubling trick, we establish an anytime sub-linear high-probability regret bound as well as almost-sure global convergence of the average regret at an asymptotically sub-linear rate. These provide the first set of global convergence and sample efficiency results for the well-known REINFORCE algorithm and contribute to a better understanding of its performance in practice. Comment: Accepted to AAAI 2021. Fixed typos in constants and enriched the literature review
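    As a concrete reference point, here is a minimal single-trajectory REINFORCE sketch under softmax parametrization with a log-barrier regularizer, in the spirit of the setting analyzed here. The tabular environment, horizon, and hyperparameters are illustrative assumptions (undiscounted returns, no doubling trick).

    ```python
    import numpy as np

    # Single-trajectory REINFORCE with softmax logits and a log-barrier
    # regularizer. Environment and hyperparameters are assumptions.
    rng = np.random.default_rng(1)
    S, A, H = 4, 3, 10                         # states, actions, horizon
    P = rng.dirichlet(np.ones(S), size=(S, A)) # transition kernel
    R = rng.random((S, A))                     # reward table
    theta = np.zeros((S, A))                   # softmax logits
    lam, lr = 0.01, 0.1                        # barrier weight, step size

    def policy(theta):
        e = np.exp(theta - theta.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    for _ in range(2000):
        pi = policy(theta)
        s, traj = int(rng.integers(S)), []
        for _ in range(H):                     # roll out a SINGLE trajectory
            a = int(rng.choice(A, p=pi[s]))
            traj.append((s, a, R[s, a]))
            s = int(rng.choice(S, p=P[s, a]))
        G, grad = 0.0, np.zeros_like(theta)
        for s, a, r in reversed(traj):         # returns-to-go
            G += r
            glog = -pi[s]                      # grad of log softmax w.r.t. logits at s
            glog[a] += 1.0
            grad[s] += glog * G                # REINFORCE estimator
        # The log-barrier term lam * sum log pi(a|s) keeps action probabilities
        # bounded away from zero; its gradient in the logits is lam * (1 - A * pi).
        theta += lr * (grad + lam * (1.0 - A * pi))
    ```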

    MFGLib: A Library for Mean-Field Games

    Mean-field games (MFGs) are limiting models that approximate N-player games and have a wide range of applications. Despite the ever-growing numerical literature on the computation of MFGs, there is no library that allows researchers and practitioners to easily create and solve their own MFG problems. The purpose of this document is to introduce MFGLib, an open-source Python library for solving general MFGs with a user-friendly and customizable interface. It serves as a handy tool for creating and analyzing generic MFG environments, and ships with embedded auto-tuners for all implemented algorithms. The package is distributed under the MIT license; the source code and documentation can be found at https://github.com/radar-research-lab/MFGLib/
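    A hypothetical usage sketch follows. The names (Environment.rock_paper_scissors, OnlineMirrorDescent, solve) mirror the style of the library's documentation but should be treated as assumptions here; consult the linked repository for the authoritative API.

    ```python
    # Hypothetical usage sketch; names are assumptions, see the repository
    # linked above for the actual API.
    from mfglib.env import Environment
    from mfglib.alg import OnlineMirrorDescent

    env = Environment.rock_paper_scissors()             # a prebuilt example MFG
    solns, expls, runtimes = OnlineMirrorDescent().solve(env)
    print(f"final exploitability: {expls[-1]:.4f}")     # lower = closer to equilibrium
    ```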

    The Contagion Effect of Compensation Regulation: Evidence From China

    To shed light on whether and how firms changed compensation practices in response to a shift in their operating environment, we examine whether executive compensation regulation has a contagion effect on state-owned enterprises (SOEs) in the emerging market of China. Specifically, we investigate whether firms not directly affected by the changing regulatory environment nonetheless changed executive compensation in response to the actions of the directly affected firms, which we call the contagion effect. We further examine the specific contagion mechanisms and the economic consequences of the regulation on compensation. We find that the regulation has a significant effect on the compensation gap in central SOEs and a contagion effect on local SOEs, but not on non-SOEs. Within SOEs, there is an intra-industry contagion effect of compensation regulation but no intra-region effect. Further, central SOEs and local SOEs, but not non-SOEs, experience reduced firm performance after the compensation regulations, indicating that the regulation does not have favorable economic consequences for either the directly affected central SOEs or the indirectly affected local SOEs.

    Cu²⁺-Chelating Mesoporous Silica Nanoparticles for Synergistic Chemotherapy/Chemodynamic Therapy

    In this study, a pH-responsive controlled-release mesoporous silica nanoparticle (MSN) formulation was developed. The MSNs were functionalized with a histidine (His)-tagged targeting peptide (B3int) through an amide bond, and loaded with an anticancer drug (cisplatin (CP)) and a lysosomal destabilization mediator (chloroquine (CQ)). Cu²⁺ was then used to seal the pores of the MSNs via chelation with the His-tag. The resultant nanoparticles showed pH-responsive drug release and effectively targeted tumor cells via B3int. The presence of CP and Cu²⁺ permits reactive oxygen species to be generated inside cells; thus, the chemotherapeutic effect of CP is augmented by chemodynamic therapy. In vitro and in vivo experiments showed that the nanoparticles effectively kill tumor cells. An in vivo cancer model revealed that the nanoparticles increase apoptosis in tumor cells and thereby diminish tumor volume. No off-target toxicity was noted. It thus appears that the functionalized MSNs developed in this work have great potential for targeted, synergistic anticancer therapies.